CART classification of human 5' UTR sequences.

Authors

  • R V Davuluri
  • Y Suzuki
  • S Sugano
  • M Q Zhang
Abstract

A nonredundant database of 2312 full-length human 5'-untranslated regions (UTRs) was carefully prepared using state-of-the-art experimental and computational technologies. A comprehensive computational analysis of these data was conducted to characterize the 5' UTR features. Classification and regression tree (CART) analysis was used to classify the data into three distinct classes. Class I consists of mRNAs that are believed to be poorly translated, with long 5' UTRs filled with potential inhibitory features. Class II consists of terminal oligopyrimidine tract (TOP) mRNAs that are regulated in a growth-dependent manner, and class III consists of mRNAs with favorable 5' UTR features that may help efficient translation. The most accurate tree we found has 92.5% classification accuracy as estimated by cross-validation. The classification model included the presence of TOP, a secondary structure, 5' UTR length, and the presence of upstream AUGs (uAUGs) as the most relevant variables. The present classification and characterization of the 5' UTRs provide valuable information for better understanding the translational regulation of human mRNAs. Furthermore, this database and classification can help in building better computational models for predicting the 5'-terminal exon and separating the 5' UTR from the coding region.
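As a rough illustration of the kind of sequence features the abstract names (TOP presence, 5' UTR length, uAUG count), the following minimal Python sketch extracts them from a single UTR string. Everything here is an assumption made for illustration: the function names, the TOP test (a leading C followed by an uninterrupted pyrimidine run of at least `min_tract` bases), the `long_utr` threshold, and the hand-written `toy_class` rule. It is not the fitted tree from the paper, and the secondary-structure variable is left out because it would require an RNA folding tool.

```python
from dataclasses import dataclass

PYRIMIDINES = set("CTU")


@dataclass
class UtrFeatures:
    length: int       # 5' UTR length in nucleotides
    uaug_count: int   # number of upstream AUG (ATG) triplets
    has_top: bool     # hypothetical TOP-like motif at the 5' end


def has_top_motif(utr: str, min_tract: int = 5) -> bool:
    """Assumed TOP test: sequence starts with C and the leading
    uninterrupted pyrimidine run is at least min_tract bases long."""
    seq = utr.upper()
    if not seq or seq[0] != "C":
        return False
    run = 0
    for base in seq:
        if base not in PYRIMIDINES:
            break
        run += 1
    return run >= min_tract


def count_uaugs(utr: str) -> int:
    """Count ATG/AUG triplets anywhere in the UTR (all reading frames)."""
    seq = utr.upper().replace("U", "T")
    return sum(1 for i in range(len(seq) - 2) if seq[i:i + 3] == "ATG")


def extract_features(utr: str) -> UtrFeatures:
    return UtrFeatures(len(utr), count_uaugs(utr), has_top_motif(utr))


def toy_class(features: UtrFeatures, long_utr: int = 200) -> str:
    """Hand-written toy rule with an arbitrary length threshold;
    NOT the CART model described in the abstract."""
    if features.has_top:
        return "II"
    if features.length > long_utr or features.uaug_count > 0:
        return "I"
    return "III"
```

In practice, a table of such features for many UTRs would be passed to a decision-tree learner, and the accuracy would be estimated by cross-validation, as the abstract describes.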

Related articles

Factors Influencing Drug Injection History among Prisoners: A Comparison between Classification and Regression Trees and Logistic Regression Analysis

Background: Due to the importance of medical studies, researchers in this field should be familiar with various types of statistical analyses to select the most appropriate method based on the characteristics of their data sets. Classification and regression trees (CARTs) can be complementary to regression models. We compared the performance of a logistic regression model and a CART in predi...

Potential roles of 5' UTR and 3' UTR regions in post-transcriptional regulation of mouse Oct4 gene in BMSC and P19 cells

Objective(s): OCT4 is a transcription factor required for pluripotency during early embryogenesis and the maintenance of identity of embryonic stem cells and pluripotent cells. Therefore, the effective expression regulation of this gene is highly critical. UTR regions are of great significance to gene regulation. In this study, we aimed to investigate the potential regulatory role played by 5'UT...

Alternative 5'-untranslated regions of mouse GH receptor/binding protein messenger RNA are derived from sequences adjacent to the major L2 promoter.

Heterogeneity of 5' untranslated region (5'UTR) sequences is a common feature of growth hormone receptor/binding protein (GHR/BP) mRNA from a number of species. Two major 5'UTR sequences (designated L1 and L2 in the mouse) have been cloned from rodents, ruminants and primates, and are known to correspond to transcripts generated from independently regulated promoters. A variable number of other...

UTR Reconstruction and Analysis Using Genomically Aligned EST Sequences

Untranslated regions (UTR) play important roles in the posttranscriptional regulation of mRNA processing. There is a wealth of UTR-related information to be mined from the rapidly accumulating EST collections. A computational tool, UTR-extender, has been developed to infer UTR sequences from genomically aligned ESTs. It can completely and accurately reconstruct 72% of the 3' UTRs and 15% of the...

Human cellular CYBA UTR sequences increase mRNA translation without affecting the half-life of recombinant RNA transcripts

Modified nucleotide chemistries that increase the half-life (T1/2) of transfected recombinant mRNA and the use of non-native 5'- and 3'-untranslated region (UTR) sequences that enhance protein translation are advancing the prospects of transcript therapy. To this end, a set of UTR sequences that are present in mRNAs with long cellular T1/2 were synthesized and cloned as five different recombina...

Journal:
  • Genome research

Volume 10, Issue 11

Pages -

Publication date: 2000